Data Story

The data (Bondora's loan book) can be download from: https://www.bondora.com/marketing/media/LoanData.zip


In [25]:
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline

import warnings
warnings.filterwarnings("ignore", category=np.VisibleDeprecationWarning)

pd.options.display.max_rows = 125

In [26]:
import seaborn as sns
sns.set(color_codes=True)
sns.set(rc={"figure.figsize": (16, 4)})

In [64]:
loandata = pd.read_csv("data/loandata.csv", low_memory=False)
loandata['year'] = pd.to_datetime(loandata['ListedOnUTC']).dt.year
loandata['yearmonth'] = pd.to_datetime(loandata['ListedOnUTC']).dt.to_period('M')
recentld = loandata[loandata['year'] > 2012]
repaid = recentld[recentld['Status'] == 'Repaid']

In [28]:
(loandata.shape, recentld.shape)


Out[28]:
((35513, 114), (32779, 114))

Number of loans per year


In [29]:
countByYear = loandata.groupby('year').size()

In [30]:
plot = sns.barplot(x=countByYear.index,y=countByYear)


From the initial analysis we can see that the number of loans is definitely growing over time. This can be caused by a higher demand for loans or rise in popularity of Bondora.

Median salary per year and country


In [32]:
t = loandata[['year', 'IncomeTotal', 'Country']]
t = t[(t['year'] > 2010) & (t['year'] < 2017)]

In [33]:
plot = t.groupby(['year', 'Country']).median().unstack(1).plot(kind='bar', figsize=(16, 4))


We can see that, generally, the income of the borrowers increases over time. This is an expected behaviour as the countries, where Bondora operates, have seen an increase of average salary over the last years.

Loan amount analysis


In [34]:
plot = sns.distplot(loandata['Amount'].astype(int), bins=50)


List of the top 30 loan amount (round to nearest 100) with counts:


In [35]:
plot = (loandata['Amount'] // 100 * 100).value_counts().head(30).plot(kind='bar')


The most common loan amount is 500 EUR with the first 13 being lower or equal to 3100 EUR.

Distributions of loan amounts over years


In [36]:
plot = sns.violinplot(cut=0, scale="width", x="year", y="Amount", data=loandata[['Amount', 'year']])


In the first couple of years the loans were relatively much lower then in the last years. Average, minimum and maximum loan amounts increase over time.

Distribution of loan amounts per country


In [37]:
plot = sns.violinplot(cut=0, scale="width", x="Country", y="Amount", data=loandata[['Amount', 'Country']])


Finland has the highest most frequent loan amount (about 2100 EUR) and Estonia the lowest (about 500 EUR). The shapes of the distrubtions are similar across all the countries.

Loan duration analysis


In [38]:
pd.options.mode.chained_assignment = None  # default='warn'
t = loandata[['Amount', 'LoanDuration']]
t['LoanDuration2'] = t['LoanDuration'] // 12 * 12

In [39]:
plot = sns.distplot(loandata['LoanDuration'], bins=50) # remove density


Loan duration with relation to the amount


In [40]:
plot = sns.violinplot(cut=0, scale="width", x="LoanDuration2", y="Amount", data=t)


There is a visible linear dependency between the amount borrowed and loan duration -- the longer the loan the higher amount borrowed.

Loan duration with relation to year of issue


In [41]:
plot = sns.violinplot(cut=0, scale="width", x="year", y="LoanDuration", data=loandata[['year', 'LoanDuration']])


Over the first three years Bondora issued loans of maximum 24 months duration, but since 2013 the maximum duration is 60 months. We can see that the most popular durations in the recent years are 36 and 60 months with very few borrowers choosing durations lower than 12 months.


In [65]:
plot = sns.violinplot(cut=0, scale="width", x="year", y="LoanDuration", data=repaid[['year', 'LoanDuration']])


Number of dependants vs age


In [42]:
p = loandata[['Age', 'NrOfDependants']]
p['DepNum'] = pd.to_numeric(loandata.NrOfDependants, errors='coerce')
plot = p.groupby('NrOfDependants').size().sort_values().plot(kind='bar')


More than half of the borrowers have no dependants at all with very few borrowers have more than 5 dependants.


In [70]:
p = p.dropna().astype(int)
grid = sns.lmplot(x="Age", y="NrOfDependants", data=p, fit_reg=False, size=6, aspect=3)


We can see a non linear dependency between the age of the borrower and number of the dependants, gradually increasing from the age of 18, reaching peak between 40-45, and then gradually decreasing.

Number of loans listed per year month


In [44]:
loandata['yearmonth'] = pd.to_datetime(loandata['ListedOnUTC']).dt.to_period('M')

In [45]:
plot = loandata.groupby(['yearmonth', 'Country']).size().unstack(1).sort_index(ascending=True).fillna(0).plot(figsize=(16, 5))


From the analysis of the loans listed per yearmonth, it is clearly visible, that Slovakian loans were listed only for a short period of time (mostly 2014) and since then borrowing from that country has been phased out.

Distribution of loan amounts for genders


In [46]:
plot = sns.violinplot(cut=0, scale="width", x="Amount", y="Gender", orient='h', data=recentld)



In [47]:
df = recentld
m = df[df['Gender'] == 0.0]
f = df[df['Gender'] == 1.0]
(m.shape, f.shape)


Out[47]:
((17369, 114), (13501, 114))

In [48]:
v = 'Amount'

m_mean = m[v].dropna().mean()
f_mean = f[v].dropna().mean()
std = df[v].dropna().std()
z = (m_mean - f_mean) / std
(m_mean, f_mean, std, m_mean - f_mean, z)


Out[48]:
(2745.6839196269216,
 2631.183838234205,
 2137.562338978778,
 114.50008139271677,
 0.05356572732630538)

H0: No difference between mean loan amount for female borrowers and male borrowers. H1 - there is a difference.

Historical repayment rate of principal and amount of interest with penalties


In [53]:
repaid['Defaulted'] = repaid['PrincipalPaymentsMade'] < repaid['Amount'] 
repaid[['Defaulted', 'PrincipalPaymentsMade', 'InterestAndPenaltyPaymentsMade', 'Amount', 'Interest']]


Out[53]:
Defaulted PrincipalPaymentsMade InterestAndPenaltyPaymentsMade Amount Interest
7 False 1000.00 403.89 1000.0 61.49
8 False 4000.00 1567.36 4000.0 31.01
10 False 2000.00 575.44 2000.0 17.11
11 False 530.00 130.03 530.0 25.68
13 False 6900.00 3.74 6900.0 21.63
18 False 3720.00 512.48 3720.0 26.94
29 False 1500.00 204.25 1500.0 15.91
49 False 3590.00 14.59 3590.0 21.29
53 False 3720.00 331.42 3720.0 31.12
54 False 2655.00 670.46 2655.0 34.57
56 False 2605.00 132.15 2605.0 18.12
58 False 2430.00 538.27 2430.0 31.12
72 False 500.00 45.46 500.0 44.01
74 False 635.00 3.73 635.0 29.95
92 False 500.00 93.26 500.0 20.77
93 False 2500.00 313.81 2500.0 20.60
94 False 8505.00 1785.97 8505.0 24.52
111 False 6160.00 1705.82 6160.0 26.64
119 False 1000.00 57.37 1000.0 16.11
121 False 1000.00 118.33 1000.0 61.49
124 True 1184.65 495.51 1455.0 57.09
126 False 3485.00 265.92 3485.0 32.85
143 False 5475.00 1809.86 5475.0 27.98
147 False 2000.00 193.64 2000.0 20.12
148 False 1000.00 229.65 1000.0 24.47
154 False 10000.00 360.77 10000.0 26.37
164 False 4500.00 1014.53 4500.0 17.95
165 False 1000.00 151.58 1000.0 25.84
169 False 1500.00 76.85 1500.0 23.44
170 False 1500.00 479.93 1500.0 35.87
171 False 1600.00 716.74 1600.0 26.78
173 False 500.00 78.36 500.0 32.14
176 True 2553.31 599.17 3500.0 17.12
177 False 4000.00 1086.96 4000.0 43.43
180 False 3300.00 1082.85 3300.0 31.12
186 False 2655.00 86.76 2655.0 28.06
187 False 2550.00 854.39 2550.0 29.04
193 False 455.00 66.76 455.0 57.68
198 False 3500.00 949.79 3500.0 28.08
199 False 3000.00 850.38 3000.0 24.09
200 False 2000.00 946.55 2000.0 35.60
202 False 2000.00 267.69 2000.0 40.09
203 False 1000.00 130.43 1000.0 54.17
208 False 1000.00 306.97 1000.0 45.22
211 False 1000.00 717.64 1000.0 40.40
214 False 2100.00 455.23 2100.0 16.22
219 False 3190.00 453.88 3190.0 43.34
225 False 500.00 27.45 500.0 19.66
248 False 600.00 109.49 600.0 17.23
249 False 865.00 94.73 865.0 39.66
253 False 500.00 118.48 500.0 20.24
254 False 1000.00 315.33 1000.0 53.15
264 False 2000.00 393.43 2000.0 27.56
282 False 1595.00 178.82 1595.0 23.73
294 False 1595.00 223.95 1595.0 40.40
298 False 2120.00 406.33 2120.0 40.03
300 False 3080.00 498.14 3080.0 31.38
313 False 2175.00 2226.78 2175.0 71.97
314 False 1755.00 105.29 1755.0 31.12
319 False 3000.00 448.97 3000.0 19.77
326 False 5000.00 923.96 5000.0 21.06
330 False 1910.00 16.21 1910.0 28.07
... ... ... ... ... ...
35161 False 6375.00 882.12 6375.0 15.56
35166 False 4000.00 1073.77 4000.0 26.76
35170 False 6500.00 1878.73 6500.0 32.49
35171 False 1000.00 88.08 1000.0 14.57
35172 False 600.00 204.00 600.0 31.12
35173 False 1000.00 161.70 1000.0 26.75
35175 False 1595.00 253.52 1595.0 40.96
35180 False 2000.00 275.86 2000.0 22.69
35181 False 500.00 60.67 500.0 32.14
35200 False 1500.00 6.42 1500.0 17.27
35207 False 3190.00 598.56 3190.0 23.54
35215 True 479.69 216.83 600.0 34.58
35227 False 3055.00 1.43 3055.0 16.04
35228 False 1150.00 784.21 1150.0 73.73
35232 False 800.00 456.79 800.0 31.12
35236 False 3000.00 333.81 3000.0 14.29
35240 False 1315.00 58.65 1315.0 55.49
35250 False 5000.00 543.35 5000.0 26.83
35254 False 600.00 48.15 600.0 26.87
35266 False 4355.00 183.44 4355.0 44.53
35274 False 900.00 0.00 900.0 25.74
35275 False 3080.00 514.51 3080.0 28.03
35280 False 1800.00 175.12 1800.0 29.13
35293 False 1000.00 204.62 1000.0 20.23
35309 False 10000.00 3711.53 10000.0 30.46
35329 False 5000.00 1138.27 5000.0 36.43
35330 False 2125.00 156.45 2125.0 14.92
35331 False 1740.00 876.10 1740.0 47.05
35334 False 1200.00 162.77 1200.0 31.12
35349 False 1900.00 205.99 1900.0 28.85
35358 False 800.00 41.15 800.0 31.12
35359 False 4000.00 1199.31 4000.0 23.33
35360 False 500.00 150.14 500.0 44.91
35365 False 1000.00 149.21 1000.0 21.62
35367 False 530.00 141.16 530.0 41.61
35368 False 1200.00 39.99 1200.0 46.63
35377 False 2000.00 205.71 2000.0 24.87
35379 False 3000.00 10.39 3000.0 24.89
35387 False 1200.00 257.69 1200.0 15.51
35390 False 500.00 423.87 500.0 47.52
35393 False 2000.00 153.81 2000.0 16.55
35396 False 7500.00 2858.18 7500.0 21.59
35399 False 1000.00 92.81 1000.0 15.95
35401 False 1500.00 593.73 1500.0 61.49
35403 False 4000.00 413.64 4000.0 19.66
35409 False 600.00 109.27 600.0 47.80
35435 False 1000.00 48.00 1000.0 28.21
35438 False 10000.00 1383.23 10000.0 22.67
35442 False 2500.00 638.05 2500.0 16.76
35443 False 3000.00 506.63 3000.0 31.12
35445 False 500.00 44.25 500.0 15.88
35447 False 1060.00 49.27 1060.0 29.39
35455 False 600.00 72.06 600.0 19.35
35456 False 500.00 73.51 500.0 31.07
35462 False 1000.00 81.13 1000.0 25.56
35465 False 6155.00 5.22 6155.0 35.82
35471 False 2700.00 712.95 2700.0 24.99
35472 False 500.00 96.29 500.0 40.40
35477 False 1000.00 6.72 1000.0 22.37
35483 False 1600.00 240.09 1600.0 34.98
35484 False 1000.00 204.11 1000.0 46.92
35510 False 1485.00 186.41 1485.0 64.51

7167 rows × 5 columns


In [59]:
print(repaid.shape)
print(repaid['Defaulted'].mean())


(7167, 115)
0.0293009627459

In [63]:
print(repaid['PrincipalPaymentsMade'].sum() / repaid['Amount'].sum())
print(repaid['InterestAndPenaltyPaymentsMade'].sum() / repaid['Amount'].sum())
print((repaid['PrincipalPaymentsMade'].sum() + repaid['InterestAndPenaltyPaymentsMade'].sum()) / repaid['Amount'].sum())


0.9859871652215287
0.23997421529014298
1.2259613805116716

In [ ]: